708 research outputs found

    Spectral classification of short numerical exon and intron sequences

    This research presents three new numerical representations for classifying short exon and intron sequences using the discrete Fourier transform period-3 value. Using human genome data, the results indicate that the Complex Twin-Pair representation compares favorably with other numerical representations, and that the approach has potential applications in genome annotation and read mapping.
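    The period-3 idea can be sketched in Python. The abstract does not define the Complex Twin-Pair representation, so this sketch swaps in the standard Voss binary indicator mapping (one 0/1 sequence per nucleotide) and sums the DFT power at frequency index k = N/3, where protein-coding regions tend to show a peak. The function names are hypothetical, and requiring N to be a multiple of 3 is a simplifying assumption.

```python
import cmath


def indicator_sequences(seq):
    """Voss mapping (swapped in for illustration): one 0/1 sequence per base."""
    seq = seq.upper()
    return {b: [1 if s == b else 0 for s in seq] for b in "ACGT"}


def dft_at(x, k):
    """Single DFT coefficient X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_total = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * n / n_total)
               for n, v in enumerate(x))


def period3_value(seq):
    """Total spectral power at k = N/3, summed over the four indicator sequences."""
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sketch assumes length is a positive multiple of 3")
    k = n // 3
    return sum(abs(dft_at(x, k)) ** 2
               for x in indicator_sequences(seq).values())


# A strictly codon-periodic toy sequence has a strong period-3 component,
# while a homopolymer has essentially none.
print(period3_value("ATGATGATG"), period3_value("AAAAAAAAA"))
```

    Computing only the single coefficient X[N/3] keeps the sketch dependency-free; a real pipeline would more likely use an FFT library and a sliding window over longer sequences.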

    Integrating diverse databases into a unified analysis framework: a Galaxy approach

    Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas in the past there were relatively few central repositories serving genomic data, an increasing number of distinct, specialized data repositories and resources have now been established. Here, we describe a generic approach for integrating a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach simplifies the coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data-mining facilities of the external data resources.

    BigWig and BigBed: enabling browsing of large distributed datasets

    Summary: BigWig and BigBed files are compressed, binary, indexed files containing data at several resolutions, which allow high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating system files, R-trees, and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed datasets.
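    A minimal sketch of the multi-resolution idea, assuming a dense per-base signal held in memory: precompute coarser "zoom" summaries (count, min, max and sum per bin), then let a viewer pick the coarsest level that still resolves one screen pixel. This is not the bigWig on-disk layout, which additionally relies on compression, an R-tree index and remote fetching of only the needed blocks; `build_zoom_level`, `choose_zoom` and `BinSummary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BinSummary:
    start: int      # 0-based start of the bin
    end: int        # exclusive end of the bin
    count: int      # number of bases summarized
    min_val: float
    max_val: float
    total: float    # sum of values, so mean = total / count


def build_zoom_level(values, bin_size):
    """Summarize a dense per-base signal into fixed-size bins (last bin may be short)."""
    summaries = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        summaries.append(BinSummary(start, start + len(chunk), len(chunk),
                                    min(chunk), max(chunk), float(sum(chunk))))
    return summaries


def choose_zoom(bin_sizes, view_bases, pixels):
    """Pick the coarsest bin size not exceeding the bases shown per pixel.

    Returns 0 to mean "no summary is coarse enough; read full-resolution data".
    """
    bases_per_pixel = view_bases / pixels
    usable = [b for b in bin_sizes if b <= bases_per_pixel]
    return max(usable) if usable else 0


# A 1 Mb view drawn in 1000 pixels can be served from 1000-base summaries.
print(choose_zoom([10, 100, 1000], 1_000_000, 1000))
```

    Because a zoomed-out view reads a few summary records instead of every base, the amount of data transferred scales with screen width rather than with the size of the region or file.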

    HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants

    The resolution of genome-wide association studies (GWAS) is limited by the linkage disequilibrium (LD) structure of the population being studied. Selecting the most likely causal variants within an LD block is relatively straightforward within coding sequence, but is more difficult when all variants are intergenic. Predicting functional non-coding sequence has recently been facilitated by the availability of conservation and epigenomic information. We present HaploReg, a tool for exploring annotations of the non-coding genome among the results of published GWAS or novel sets of variants. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with their predicted chromatin state in nine cell types, conservation across mammals and their effect on regulatory motifs. Sets of SNPs, such as those resulting from GWAS, are analyzed for an enrichment of cell type-specific enhancers. HaploReg will be useful to researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation. The HaploReg database is available at http://compbio.mit.edu/HaploReg.
    Funding: National Institutes of Health (U.S.) (R01-HG004037); National Institutes of Health (U.S.) (RC1-HG005334); National Science Foundation (U.S.) (HG005334).
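    The LD step can be illustrated with the textbook r² statistic between two biallelic SNPs, computed here from phased haplotypes coded 0/1: r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB. This is a sketch of the standard definition, not HaploReg's code; the 0.8 threshold, the helper names and the SNP IDs in the usage line are illustrative assumptions.

```python
def ld_r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    hap_a[i] and hap_b[i] are the alleles carried on the same haplotype i.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need equal-length, non-empty haplotype lists")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # monomorphic site: r^2 undefined; this sketch reports 0
    return d * d / denom


def linked_variants(query, haplotypes, threshold=0.8):
    """Hypothetical helper: IDs whose r^2 with the query SNP meets the threshold."""
    ref = haplotypes[query]
    return sorted(snp for snp, hap in haplotypes.items()
                  if snp != query and ld_r_squared(ref, hap) >= threshold)


haps = {"snp1": [0, 1, 0, 1], "snp2": [0, 1, 0, 1], "snp3": [0, 0, 1, 1]}
print(linked_variants("snp1", haps))  # ['snp2']
```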

    The UCSC Genome Browser database: update 2010

    The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools.

    Dietary soy and meat proteins induce distinct physiological and gene expression changes in rats

    This study reports a comprehensive comparison of the effects of soy and meat proteins, given at the recommended level, on physiological markers of metabolic syndrome and on the hepatic transcriptome. Male rats were fed semi-synthetic diets for 1 wk that differed only in protein source, with casein serving as the reference. Body weight gain and adipose tissue mass were significantly reduced by soy but not by meat proteins. The insulin resistance index was improved by soy and, to a lesser extent, by meat proteins. Liver triacylglycerol contents were reduced by both protein sources, which coincided with increased plasma triacylglycerol concentrations. Both soy and meat proteins changed plasma amino acid patterns. The expression of 1571 and 1369 genes was altered by soy and meat proteins, respectively. Functional classification revealed that lipid, energy and amino acid metabolic pathways, as well as insulin signaling pathways, were regulated differently by soy and meat proteins. Several transcriptional regulators, including NFE2L2, ATF4, Srebf1 and Rictor, were identified as potential key upstream regulators. These results suggest that soy and meat proteins induce distinct physiological and gene expression responses in rats, and they provide novel evidence regarding the health effects of different protein sources in human diets.

    The UCSC Genome Browser Database: update 2009

    The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) is a publicly available collection of genome assembly sequence data and integrated annotations for a large number of organisms, including extensive comparative-genomic resources. In the past year, 13 new genome assemblies have been added, including two important primate species, orangutan and marmoset, bringing the total to 46 assemblies for 24 different vertebrates and 39 assemblies for 22 different invertebrate animals. The GBD datasets may be viewed graphically with the UCSC Genome Browser, which uses a coordinate-based display system allowing users to juxtapose a wide variety of data. These data include all mRNAs from GenBank mapped to all organisms, RefSeq alignments, gene predictions, regulatory elements, gene expression data, repeats, SNPs and other variation data, as well as pairwise and multiple-genome alignments. A variety of other bioinformatics tools are also provided, including BLAT, the Table Browser, the Gene Sorter, the Proteome Browser, VisiGene and Genome Graphs.

    The UCSC Archaeal Genome Browser: 2012 update

    The UCSC Archaeal Genome Browser (http://archaea.ucsc.edu) offers a graphical web-based resource for exploration and discovery within archaeal and other selected microbial genomes. By bringing together existing gene annotations, gene expression data, multiple-genome alignments, pre-computed sequence comparisons and other specialized analysis tracks, the genome browser is a powerful aggregator of varied genomic information. The genome browser environment maintains the current look-and-feel of the vertebrate UCSC Genome Browser, but also integrates archaeal- and bacterial-specific tracks with a few graphic display enhancements. The browser currently contains 115 archaeal genomes, plus 31 genomes of viruses known to infect archaea. Some of the recently developed or enhanced tracks visualize data from published high-throughput RNA-sequencing studies, the NCBI Conserved Domain Database, sequences from pre-genome sequencing studies, predicted gene boundaries from three different protein gene prediction algorithms, tRNAscan-SE gene predictions with RNA secondary structures, and CRISPR locus predictions. We have also developed a companion resource, the Archaeal COG Browser, to provide better search and display of arCOG gene function classifications, including their phylogenetic distribution among available archaeal genomes.

    Patterns of Evolution and Host Gene Mimicry in Influenza and Other RNA Viruses

    It is well known that the dinucleotide CpG is under-represented in the genomic DNA of many vertebrates. This is commonly thought to be due to the methylation of cytosine residues in this dinucleotide and the correspondingly high rate of deamination of 5-methylcytosine, which lowers the frequency of this dinucleotide in DNA. Surprisingly, many single-stranded RNA viruses that replicate in these vertebrate hosts also have a very low frequency of CpG dinucleotides in their genomes. Viruses are obligate intracellular parasites, and the evolution of a virus is inexorably linked to the nature and fate of its host. One therefore expects virus and host genomes to share common features. In this work, we compare evolutionary patterns in the genomes of ssRNA viruses and their hosts. In particular, we analyzed dinucleotide patterns and found that the same dinucleotides are pervasively over- or under-represented in many RNA viruses and their hosts, suggesting that many RNA viruses evolve by mimicking some of the features of their host's genes (DNA) and likely also of the corresponding mRNAs. When a virus crosses a species barrier into a different host, the pressure to replicate, survive and adapt leaves a footprint in dinucleotide frequencies. For instance, since human genes seem to be under higher pressure than avian genes to eliminate CpG dinucleotide motifs, this pressure might be reflected in the genomes of human viruses (DNA and RNA viruses) when compared with those of the same viruses replicating in avian hosts. To test this idea, we analyzed the evolution of the influenza virus since 1918. We find that the influenza A virus, which originated from an avian reservoir and has been replicating in humans over many generations, evolves in a direction strongly selected to reduce the frequency of CpG dinucleotides in its genome. 
Consistent with this observation, we find that the influenza B virus, which has spent much more time in the human population, has adapted to its human host and exhibits an extremely low CpG dinucleotide content. We believe that these observations directly show that the evolution of RNA viral genomes can be shaped by pressures observed in the host genome. As a possible explanation, we suggest that the strong selection pressures acting on these RNA viruses are most likely related to the innate immune response and to nucleotide motifs in the host DNA and RNAs.
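    Over- and under-representation of a dinucleotide is commonly quantified with the relative abundance (odds ratio) ρ_XY = f(XY) / (f(X) f(Y)); values well below 1 for CpG indicate depletion. The abstract does not say which statistic the authors used, so treat this Python sketch as an assumption. It counts overlapping dinucleotides on one strand only and maps U to T so that RNA and DNA sequences are handled alike; `dinucleotide_odds_ratio` is a hypothetical name.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq, dinuc="CG"):
    """Relative abundance rho_XY = f(XY) / (f(X) * f(Y)) on a single strand.

    f(X) is a mononucleotide frequency over all n positions; f(XY) is the
    frequency of XY among the n - 1 overlapping dinucleotides.
    """
    seq = seq.upper().replace("U", "T")      # treat RNA and DNA alike
    x, y = dinuc.upper().replace("U", "T")   # the two bases of the dinucleotide
    n = len(seq)
    if n < 2:
        raise ValueError("sequence too short")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    f_x = mono[x] / n
    f_y = mono[y] / n
    f_xy = di[x + y] / (n - 1)
    if f_x == 0 or f_y == 0:
        return float("nan")                  # ratio undefined without X or Y
    return f_xy / (f_x * f_y)


# Values near 1 mean "as expected from base composition"; CpG-depleted
# genomes give values well below 1.
print(dinucleotide_odds_ratio("GGCCAUGGCC"))
```

    Normalizing by the mononucleotide frequencies separates genuine dinucleotide avoidance from a simple shortage of C or G, which matters when comparing genomes with different base compositions, such as avian- and human-adapted virus lineages.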

    A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

    Get PDF
    GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al